Abstract: Clustering techniques are mostly unsupervised methods that
organize data into groups based on similarities among the individual
data items. Fuzzy c-means (FCM) clustering is one of the well-known
unsupervised clustering techniques, and it can also be used for
unsupervised web document clustering. In this chapter we introduce a
modified clustering method in which the data to be clustered are
represented by graphs instead of vectors or other models. Specifically,
we extend the classical FCM clustering algorithm to work with graphs
that represent web documents (Phukon, 2012; Zadeh, 1965; Dunn, 1974).
We use graphs because they retain information that is often discarded
in simpler models.

Keywords: Graph, Web Document, Hard Par-

1. INTRODUCTION


Fuzzy clustering is well known not only in the fuzzy community, but
also in the related fields of data analysis, neural networks, and other
areas of computational intelligence. The FCM algorithm, proposed by
Dunn (1974) and extended by Bezdek (1981) and by Cannon, Dave, and
Bezdek (1986), can be applied if the objects of interest are
represented as points in a multi-dimensional space. FCM relates the
concept of object similarity to spatial closeness and finds cluster
centers as prototypes. Several applications of FCM to real clustering
problems have demonstrated the good characteristics of this algorithm
with respect to stability and partition quality.

In general, cluster analysis refers to a broad spectrum of methods
which try to subdivide a data set X into c subsets (clusters) which are
pairwise disjoint, all nonempty, and reproduce X via union. The
clusters are then termed a hard (i.e., non-fuzzy) c-partition of X. A
significant defect in the axiomatic model underlying this type of
algorithm is that each point in X is unequivocally grouped with the
other members of its cluster, and thus bears no apparent similarity to
other members of X. One way to characterize an individual point's
similarity to all the clusters was introduced by Zadeh (1965). The key
to Zadeh's idea is to represent the similarity a point shares with each
cluster with a function (termed the membership function) whose values
(called memberships) are between zero and one.
Baruah (2011) has defined the membership function of a normal fuzzy
number $N = [\alpha, \beta, \gamma]$ as

$$\mu_N(x) = \begin{cases} \Phi_1(x), & \text{if } \alpha \le x \le \beta, \\ \Phi_2(x), & \text{if } \beta \le x \le \gamma, \\ 0, & \text{otherwise,} \end{cases}$$

(Eq: 1.1)

where $\Phi_1(x)$ and $(1 - \Phi_2(x))$ are two independent
distribution functions defined on $[\alpha, \beta]$ and
$[\beta, \gamma]$ respectively.

Clustering techniques are generally applied to data that are
quantitative (numerical), qualitative (categorical), or a mixture of
both. In this chapter, however, we put forward a means of clustering
graphical objects with the help of the FCM algorithm. Let us start with
quantitative data, where each observation consists of n measured
variables, grouped into an n-dimensional column vector
$z_k = [z_{1k}, \ldots, z_{nk}]^T$, $z_k \in \mathbb{R}^n$. A set of N
observations is denoted by $Z = \{z_k \mid k = 1, 2, \ldots, N\}$, and
is represented as an n × N matrix:

$$Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nN} \end{bmatrix}$$

In pattern-recognition terminology, the columns of this matrix are
called patterns or objects, the rows are called features or attributes,
and Z is called the pattern or data matrix. The meaning of the columns
and rows of Z depends on the context.


2. HARD AND FUZZY PARTITIONS 


Hard clustering methods are based on classical set theory, and require
that an object either does or does not belong to a cluster. Hard
clustering means partitioning the data into a specified number of
mutually exclusive subsets. Fuzzy clustering methods, however, allow
the objects to belong to several clusters simultaneously, with
different degrees of membership. In many situations, fuzzy clustering
is more natural than hard clustering. Objects on the boundaries between
several classes are not forced to fully belong to one of the classes,
but rather are assigned membership degrees between 0 and 1 indicating
their partial membership.


2.1. Hard Partition 


The objective of clustering is to partition the data set Z into c
clusters (groups, classes). Using classical sets, a hard partition of Z
can be defined as a family of subsets
$\{A_i \mid 1 \le i \le c\} \subset P(Z)$ (where P(Z) is the power set
of Z) with the following properties (Bezdek, 1981):

$$\bigcup_{i=1}^{c} A_i = Z,$$

$$A_i \cap A_j = \emptyset, \quad 1 \le i \ne j \le c,$$

$$\emptyset \subset A_i \subset Z, \quad 1 \le i \le c.$$

(Eq: 2.1.1, 2.1.2 & 2.1.3 respectively.)


Equation (2.1.1) means that the union of the subsets $A_i$ contains all
the data. The subsets must be disjoint, as stated by (2.1.2), and none
of them is empty nor contains all the data in Z (2.1.3). In terms of
membership (characteristic) functions, a partition can be conveniently
represented by the partition matrix $U = [\mu_{ik}]_{c \times N}$. The
ith row of this matrix contains values of the membership function
$\mu_i$ of the ith subset $A_i$ of Z. It follows from the above
equations that the elements of U must satisfy the following conditions:

$$\mu_{ik} \in \{0, 1\}, \quad 1 \le i \le c, \; 1 \le k \le N,$$

$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \le k \le N,$$

$$0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c.$$

(Eq: 2.2.1, 2.2.2 & 2.2.3 respectively.)

The space of all possible hard partition matrices for Z, called the
hard partitioning space (Bezdek, 1981), is thus defined by:

$$M_{hc} = \left\{ U \in \mathbb{R}^{c \times N} \;\middle|\; \mu_{ik} \in \{0, 1\}, \forall i, k; \; \sum_{i=1}^{c} \mu_{ik} = 1, \forall k; \; 0 < \sum_{k=1}^{N} \mu_{ik} < N, \forall i \right\}$$

(Eq: 2.3)
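
As an illustration of conditions (2.2.1) to (2.2.3), the following minimal Python sketch tests whether a matrix belongs to the hard partitioning space. The function name `is_hard_partition` and the small test matrices are hypothetical and chosen only for this example; the matrix is stored as a list of c rows with N entries each.

```python
def is_hard_partition(U):
    """Check conditions (2.2.1)-(2.2.3) for a c x N matrix given as a list of rows."""
    c, N = len(U), len(U[0])
    # (2.2.1): every entry is either 0 or 1
    binary = all(u in (0, 1) for row in U for u in row)
    # (2.2.2): every column sums to 1, so each item lies in exactly one cluster
    columns_ok = all(sum(U[i][k] for i in range(c)) == 1 for k in range(N))
    # (2.2.3): no cluster is empty and no cluster contains all N items
    rows_ok = all(0 < sum(row) < N for row in U)
    return binary and columns_ok and rows_ok


# Hypothetical 2 x 4 matrices used only to exercise the three conditions.
U_valid = [[1, 1, 0, 0],
           [0, 0, 1, 1]]
U_overlap = [[1, 1, 1, 0],
             [0, 0, 1, 1]]   # third item assigned to both clusters
U_empty = [[1, 1, 1, 1],
           [0, 0, 0, 0]]     # second cluster is empty

print(is_hard_partition(U_valid))    # True
print(is_hard_partition(U_overlap))  # False
print(is_hard_partition(U_empty))    # False
```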


Example 1.1 Hard partition: Let us illustrate the concept of hard
partition by a simple example. Consider a data set
$Z = \{z_1, z_2, \ldots, z_{10}\}$, consisting of 10 web pages, each
represented by a graph. Suppose that, after calculating the distance
[2, 3] between every pair of graphs with the formula

$$dist_{MCS}(z_i, z_j) = 1 - \frac{|mcs(z_i, z_j)|}{\max(|z_i|, |z_j|)}, \quad i, j = 1, 2, \ldots, 10,$$

(Eq: 2.4)

where $mcs(z_i, z_j)$ is the maximum common subgraph of the two graphs
and $|\cdot|$ denotes the size of a graph, we obtain the data set shown
in the figure below:

Figure 1.1. A dataset in $\mathbb{R}^2$.

A visual inspection of this data may suggest two well-separated
clusters (data points $z_1$ to $z_4$ and $z_7$ to $z_{10}$
respectively), one point in between the two clusters ($z_5$), and an
"outlier" $z_6$. One particular partition $U \in M_{hc}$ of the data
into two subsets (out of the $2^{10}$ possible hard partitions) is

$$U = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}$$

The first row of U defines point-wise the characteristic function for
the first subset of Z, $A_1$, and the second row defines the
characteristic function of the second subset of Z, $A_2$. Each sample
must be assigned exclusively to one subset (cluster) of the partition.
In this case, both the boundary point $z_5$ and the outlier $z_6$ have
been assigned to $A_1$. It is clear that a hard partitioning may not
give a realistic picture of the underlying data. Boundary data points
may represent patterns with a mixture of properties of data in $A_1$
and $A_2$, and therefore cannot be fully assigned to either of these
classes, nor do they constitute a separate class. This shortcoming can
be alleviated by using fuzzy partitions, as shown in the following
sections.
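
The distance of Eq. 2.4 can be sketched in a few lines of Python under two simplifying assumptions that are not stated in the text: every node label is unique within a graph (for example, one node per distinct term of a web page), so that the maximum common subgraph reduces to the intersection of the node and edge sets, and the size of a graph is counted as nodes plus edges. The names `graph_size`, `mcs`, `dist_mcs` and the two sample pages are hypothetical.

```python
def graph_size(graph):
    """Size of a graph given as (nodes, edges); counting nodes plus edges is an assumption."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs(g1, g2):
    """Maximum common subgraph, assuming unique node labels (set intersection)."""
    return g1[0] & g2[0], g1[1] & g2[1]


def dist_mcs(g1, g2):
    """Distance of Eq. 2.4: 1 - |mcs(g1, g2)| / max(|g1|, |g2|)."""
    denom = max(graph_size(g1), graph_size(g2))
    if denom == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - graph_size(mcs(g1, g2)) / denom


# Hypothetical term graphs: nodes are terms, directed edges link adjacent terms.
page_a = ({"fuzzy", "graph", "web"}, {("fuzzy", "graph"), ("graph", "web")})
page_b = ({"fuzzy", "graph", "data"}, {("fuzzy", "graph")})

print(round(dist_mcs(page_a, page_b), 3))  # 0.4: common part has size 3, larger graph has size 5
print(dist_mcs(page_a, page_a))            # 0.0
```

For graphs without unique labels the maximum common subgraph problem is much harder, and a dedicated graph-matching routine would be needed in place of `mcs`.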


2.2. Fuzzy Partition 


Generalization of the hard partition to the fuzzy case follows directly
by allowing $\mu_{ik}$ to attain real values in [0, 1]. Conditions for
a fuzzy partition matrix, analogous to (2.2), are given by (Ruspini,
1970):

$$\mu_{ik} \in [0, 1], \quad 1 \le i \le c, \; 1 \le k \le N,$$

$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \le k \le N,$$

$$0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c.$$

(Eq: 2.5.1, 2.5.2 & 2.5.3 respectively.)

The ith row of the fuzzy partition matrix U contains values of the ith
membership function of the fuzzy subset $A_i$ of Z. Equation (2.5.2)
constrains the sum of each column to 1, and thus the total membership
of each $z_k$ in Z equals one. The fuzzy partitioning space for Z is
the set

$$M_{fc} = \left\{ U \in \mathbb{R}^{c \times N} \;\middle|\; \mu_{ik} \in [0, 1], \forall i, k; \; \sum_{i=1}^{c} \mu_{ik} = 1, \forall k; \; 0 < \sum_{k=1}^{N} \mu_{ik} < N, \forall i \right\}$$

(Eq. 2.6)


Example 1.2: Fuzzy partition: Let us consider the data set from Example
1.1. One of the infinitely many fuzzy partitions of Z is:

$$U = \begin{bmatrix} 1.0 & 1.0 & 1.0 & 0.8 & 0.5 & 0.5 & 0.2 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.2 & 0.5 & 0.5 & 0.8 & 1.0 & 1.0 & 1.0 \end{bmatrix}$$

The boundary point $z_5$ now has a membership degree of 0.5 in both
classes, which correctly reflects its position in the middle between
the two clusters. Note, however, that the outlier $z_6$ has the same
pair of membership degrees, even though it is further from the two
clusters, and thus can be considered less typical of both $A_1$ and
$A_2$ than $z_5$. This is because condition (2.5.2) requires that the
sum of memberships of each point equals one. It can, of course, be
argued that three clusters are more appropriate in this example than
two. In general, however, it is difficult to detect outliers and
assign them to extra clusters.
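
The conditions (2.5.1) to (2.5.3) can be checked numerically for the partition of Example 1.2. The sketch below is only an illustration: the function name `is_fuzzy_partition` and the tolerance `tol` used for the column sums are assumptions made for this example.

```python
def is_fuzzy_partition(U, tol=1e-9):
    """Check conditions (2.5.1)-(2.5.3) for a c x N matrix given as a list of rows."""
    c, N = len(U), len(U[0])
    # (2.5.1): memberships lie in [0, 1]
    in_range = all(0.0 <= u <= 1.0 for row in U for u in row)
    # (2.5.2): memberships of each item sum to one (up to rounding)
    columns_ok = all(abs(sum(U[i][k] for i in range(c)) - 1.0) <= tol for k in range(N))
    # (2.5.3): no cluster is empty and no cluster absorbs all the membership
    rows_ok = all(0.0 < sum(row) < N for row in U)
    return in_range and columns_ok and rows_ok


# Partition matrix of Example 1.2 (rows: clusters, columns: the ten items).
U = [[1.0, 1.0, 1.0, 0.8, 0.5, 0.5, 0.2, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.2, 0.5, 0.5, 0.8, 1.0, 1.0, 1.0]]

print(is_fuzzy_partition(U))  # True
# The fifth and sixth columns (boundary point and outlier) carry identical memberships.
print([U[0][4], U[1][4]] == [U[0][5], U[1][5]])  # True
```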


3. FUZZY C-MEANS CLUSTERING 


Most analytical fuzzy clustering algorithms (and also all the
algorithms presented in this chapter) are based on optimization of the
basic c-means objective function, or some modification of it. Hence we
start our discussion by presenting the FCM functional.

3.1. The Fuzzy c-Means Functional


A large family of fuzzy clustering algorithms is based on minimization
of the fuzzy c-means functional formulated as (Dunn, 1974; Bezdek,
1981):

$$J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \| z_k - v_i \|_A^2$$

(Eq: 3.1.1)

where

$$U = [\mu_{ik}] \in M_{fc}$$

(Eq: 3.1.2)

is a fuzzy partition matrix of Z,

$$V = [v_1, v_2, \ldots, v_c], \quad v_i \in \mathbb{R}^n$$

(Eq: 3.1.3)

is a vector of cluster prototypes (centers), which have to be
determined,

$$D_{ikA}^2 = \| z_k - v_i \|_A^2 = (z_k - v_i)^T A (z_k - v_i)$$

(Eq: 3.1.4)

is a squared inner-product distance norm where A is a norm-inducing
matrix, and

$$m \in [1, \infty)$$

(Eq: 3.1.5)

is a parameter which determines the fuzziness of the resulting
clusters. The value of the cost function (3.1.1) can be seen as a
measure of the total variance of $z_k$ from $v_i$.
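
To make the functional (3.1.1) concrete, the following sketch evaluates it for a toy data set with A = I (squared Euclidean distance). The data, the prototypes and the function names are hypothetical. Note that, for convenience, the data set is stored as a list of N points rather than as the n × N matrix used in the text.

```python
def sq_dist(z, v):
    """Squared Euclidean distance, i.e. (3.1.4) with A = I."""
    return sum((zj - vj) ** 2 for zj, vj in zip(z, v))


def fcm_objective(Z, U, V, m=2.0):
    """J(Z; U, V) from (3.1.1); Z has N points, U is c x N, V has c prototypes."""
    return sum(
        (U[i][k] ** m) * sq_dist(Z[k], V[i])
        for i in range(len(V))
        for k in range(len(Z))
    )


# Hypothetical toy data: two tight pairs of points on a line.
Z = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
V = [(0.5, 0.0), (4.5, 0.0)]
U_hard = [[1, 1, 0, 0], [0, 0, 1, 1]]   # each point fully in its nearest cluster
U_flat = [[0.5] * 4, [0.5] * 4]         # every point shared equally

print(fcm_objective(Z, U_hard, V))  # 1.0
print(fcm_objective(Z, U_flat, V))  # 16.5
```

For these fixed prototypes the partition that follows the visible structure gives a much smaller value of J than the uninformative one, which is the behaviour the minimization exploits.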


3.2. The Fuzzy c-Means Algorithm 


The minimization of the c-means functional (3.1.1) represents a
nonlinear optimization problem that can be solved by using a variety
of methods, including iterative minimization, simulated annealing or
genetic algorithms. The most popular method is a simple Picard
iteration through the first-order conditions for stationary points of
(3.1.1), known as the FCM algorithm.

The stationary points of the objective function (3.1.1) can be found by
adjoining the constraint (2.5.2) to J by means of Lagrange multipliers:

$$\bar{J}(Z; U, V, \lambda) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA}^2 + \sum_{k=1}^{N} \lambda_k \left[ \sum_{i=1}^{c} \mu_{ik} - 1 \right]$$

(Eq: 3.2)

and by setting the gradients of $\bar{J}$ with respect to U, V and
$\lambda$ to zero. It can be shown that if $D_{ikA}^2 > 0$,
$\forall i, k$ and $m > 1$, then
$(U, V) \in M_{fc} \times \mathbb{R}^{n \times c}$ may minimize
(3.1.1) only if

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N,$$

and

$$v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m}, \quad 1 \le i \le c.$$

(Eq: 3.3.1 & 3.3.2)

This solution also satisfies the remaining constraints (2.5.1) and
(2.5.3). Equations (3.3) are first-order necessary conditions for
stationary points of the functional (3.1.1). The FCM (Algorithm 1.1)
iterates through (3.3.1) and (3.3.2). Sufficiency of (3.3) and the
convergence of the FCM algorithm are proven in (Bezdek, 1980). Note
that (3.3.2) gives $v_i$ as the weighted mean of the data items that
belong to a cluster, where the weights are the membership degrees. That
is why the algorithm is called "c-means".
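
For readers who want the intermediate steps, conditions (3.3.1) and (3.3.2) can be derived as follows. This is a standard derivation sketched here under the assumptions that the multiplier term is written as $\lambda_k (\sum_i \mu_{ik} - 1)$, that $D_{ikA}^2 > 0$ and $m > 1$, and that A is symmetric positive definite.

```latex
% Stationarity with respect to a single membership:
\frac{\partial \bar{J}}{\partial \mu_{ik}}
  = m\,\mu_{ik}^{\,m-1} D_{ikA}^2 + \lambda_k = 0
\;\Longrightarrow\;
\mu_{ik} = \left( \frac{-\lambda_k}{m\,D_{ikA}^2} \right)^{1/(m-1)} .

% Enforcing the constraint \sum_{j=1}^{c} \mu_{jk} = 1 fixes the multiplier:
\left( \frac{-\lambda_k}{m} \right)^{1/(m-1)}
  = \frac{1}{\sum_{j=1}^{c} \left( 1 / D_{jkA}^2 \right)^{1/(m-1)}} .

% Substituting back gives (3.3.1):
\mu_{ik}
  = \frac{\left( 1 / D_{ikA}^2 \right)^{1/(m-1)}}
         {\sum_{j=1}^{c} \left( 1 / D_{jkA}^2 \right)^{1/(m-1)}}
  = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}} .

% Stationarity with respect to a prototype gives (3.3.2):
\frac{\partial \bar{J}}{\partial v_i}
  = -2 \sum_{k=1}^{N} \mu_{ik}^{\,m} A \,(z_k - v_i) = 0
\;\Longrightarrow\;
v_i = \frac{\sum_{k=1}^{N} \mu_{ik}^{\,m} z_k}{\sum_{k=1}^{N} \mu_{ik}^{\,m}} .
```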

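Algorithm 1.1 Fuzzy c-means (FCM). Given the data set Z, choose the
number of clusters 1 < c < N, the weighting exponent m > 1, the
termination tolerance $\varepsilon > 0$ and the norm-inducing matrix A.
Initialize the partition matrix randomly, such that
$U^{(0)} \in M_{fc}$.

Repeat for l = 1, 2, ...

Step 1: Compute the cluster prototypes (means):

$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m z_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m}, \quad 1 \le i \le c.$$

Step 2: Compute the distances:

$$D_{ikA}^2 = (z_k - v_i^{(l)})^T A (z_k - v_i^{(l)}), \quad 1 \le i \le c, \; 1 \le k \le N.$$

Step 3: Update the partition matrix:
for 1 ≤ k ≤ N
if $D_{ikA} > 0$ for all i = 1, 2, ..., c

$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}},$$

otherwise,
$\mu_{ik}^{(l)} = 0$ if $D_{ikA} > 0$, and $\mu_{ik}^{(l)} \in [0, 1]$
with $\sum_{i=1}^{c} \mu_{ik}^{(l)} = 1$.

until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$.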
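
The steps of Algorithm 1.1 can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: it fixes A = I, uses the maximum norm for the stopping test, splits the membership evenly among prototypes that coincide with a data point, and adds a hypothetical `max_iter` safeguard and `seed` parameter that are not part of the algorithm above. Data are stored as a list of N points.

```python
import random


def fcm(Z, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Fuzzy c-means with A = I. Returns (U, V): c x N memberships and c prototypes."""
    rng = random.Random(seed)
    N, n = len(Z), len(Z[0])
    # Random initial partition; columns are normalised so that (2.5.2) holds.
    U = [[rng.random() + 1e-6 for _ in range(N)] for _ in range(c)]
    for k in range(N):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col
    V = []
    for _ in range(max_iter):
        # Step 1: prototypes as membership-weighted means, Eq. (3.3.2).
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(N)]
            total = sum(w)
            V.append(tuple(sum(w[k] * Z[k][d] for k in range(N)) / total
                           for d in range(n)))
        # Step 2: squared Euclidean distances between prototypes and points.
        D2 = [[sum((Z[k][d] - V[i][d]) ** 2 for d in range(n)) for k in range(N)]
              for i in range(c)]
        # Step 3: membership update, Eq. (3.3.1), with the zero-distance case.
        U_new = [[0.0] * N for _ in range(c)]
        for k in range(N):
            zeros = [i for i in range(c) if D2[i][k] == 0.0]
            if zeros:
                for i in zeros:
                    U_new[i][k] = 1.0 / len(zeros)
            else:
                for i in range(c):
                    # (D_ik / D_jk)^(2/(m-1)) equals (D2_ik / D2_jk)^(1/(m-1)).
                    U_new[i][k] = 1.0 / sum(
                        (D2[i][k] / D2[j][k]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Termination test in the maximum norm.
        change = max(abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(N))
        U = U_new
        if change < eps:
            break
    return U, V


# Hypothetical toy data: two well-separated groups of three points.
Z_demo = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
U_demo, V_demo = fcm(Z_demo, c=2)
labels = [max(range(2), key=lambda i: U_demo[i][k]) for k in range(len(Z_demo))]
print(labels)
```

On data this clearly separated, the first three points should share one label and the last three the other; which group receives label 0 depends on the random initialization.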


3.3. Parameters of the FCM Algorithm 


Before using the FCM algorithm, the following parameters must be
specified: the number of clusters, c, the "fuzziness" exponent, m, the
termination tolerance, $\varepsilon$, and the norm-inducing matrix, A.
Moreover, the fuzzy partition matrix, U, must be initialized.

3.3.1. Number of Clusters


The number of clusters c is the most important parameter, in the sense
that the remaining parameters have less influence on the resulting
partition. When clustering real data without any a priori information
about the structures in the data, one usually has to make assumptions
about the number of underlying clusters. The chosen clustering
algorithm then searches for c clusters, regardless of whether they are
really present in the data or not.


3.3.2. Fuzziness Parameter 


The weighting exponent m is a rather important parameter as well,
because it significantly influences the fuzziness of the resulting
partition. As m approaches one from above, the partition becomes hard
($\mu_{ik} \in \{0, 1\}$) and the $v_i$ are ordinary means of the
clusters. As $m \to \infty$, the partition becomes completely fuzzy
($\mu_{ik} = 1/c$) and the cluster means are all equal to the mean of
Z. These limit properties of (3.1.1) are independent of the
optimization method used (Pal and Bezdek, 1995). Usually, m = 2 is
initially chosen.
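
The limit behaviour described above can be observed directly from the update formula (3.3.1). The sketch below computes the memberships of a single point from its squared distances to two prototypes for several values of m; the function name `memberships` and the distance values are hypothetical.

```python
def memberships(d2, m):
    """Memberships of one point from its squared distances d2[i] to c prototypes, Eq. (3.3.1)."""
    c = len(d2)
    return [1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
            for i in range(c)]


# Hypothetical point that is twice as far from the second prototype as from the first.
d2 = [1.0, 4.0]
for m in (1.1, 2.0, 10.0, 100.0):
    print(m, [round(u, 3) for u in memberships(d2, m)])
```

With m close to 1 the point is assigned almost entirely to the nearer prototype, for m = 2 the memberships are 0.8 and 0.2, and for large m both memberships approach 1/c = 0.5.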


3.3.3. Termination Criterion 


The FCM algorithm stops iterating when the norm of the difference
between U in two successive iterations is smaller than the termination
parameter $\varepsilon$. For the maximum norm
$\max_{i,k} \left( | \mu_{ik}^{(l)} - \mu_{ik}^{(l-1)} | \right)$, the
usual choice is $\varepsilon = 0.001$, even though
$\varepsilon = 0.01$ works well in most cases, while drastically
reducing the computing times.

3.3.4. Norm-Inducing Matrix


The shape of the clusters is determined by the choice of the matrix A
in the distance measure (3.1.4). A common choice is A = I, which gives
the standard Euclidean norm:

$$D_{ik}^2 = (z_k - v_i)^T (z_k - v_i)$$


3.3.5. Initial Partition Matrix


The partition matrix is usually initialized at random, such that
$U \in M_{fc}$. A simple approach to obtain such a U is to initialize
the cluster centers $v_i$ at random and compute the corresponding U by
(3.3.1) (i.e., by using the third step of the FCM algorithm).


4. THE MODIFIED FUZZY C MEANS 
ALGORITHM TO FIT WITH GRAPHS 


The main challenge in adapting fuzzy c-means to graphs lies in creating
a method of computing the cluster representatives.

Let us consider a graphical dataset

$$Z = \{ z_k \mid k = 1, 2, \ldots, N \}$$

Under fuzzy c-means the cluster centers are computed with a weighted
averaging that takes into account the membership values of each data
item. Thus the graph median cannot be directly used. We propose the
following method of determining cluster centers for graph-based data.
For each cluster j, use deterministic sampling to compute the number of
copies of each graph i to use, $e_j(i)$, which is defined as:

$$e_j(i) = \left\lfloor n \, \frac{\mu_{ij}}{\sum_{r=1}^{n} \mu_{rj}} \right\rfloor$$

Here $\mu_{ij}$ denotes the membership of graph i in cluster j, and n
is the total number of items in the data set. We then create a set of
graphs consisting of $e_j(i)$ copies of graph i and compute the median
graph of this set to be the representative of cluster j. So the new
algorithm becomes:

Repeat for l = 1, 2, ...

Step 1: Compute the cluster prototypes (representative median of a set
of graphs):

$$g_i^{(l)} = \arg\min_{s \in S} \frac{1}{|S|} \sum_{y=1}^{|S|} dist(s, G_y)$$

where S is the set of graphs ($S = \{G_1, G_2, \ldots, G_n\}$) and
$g \in S$ is such that g has the lowest average distance to all
elements in S [3].

Step 2: Compute the distances:

$$D_{ikA}^2 = (z_k - g_i^{(l)})^T A (z_k - g_i^{(l)}), \quad 1 \le i \le c, \; 1 \le k \le N,$$

where $(z_k - g_i)$ represents the distance between the graph $z_k$
and the cluster representative $g_i$, i.e. $dist_{MCS}(z_k, g_i)$
(refer Eq. 2.4).

Step 3: Update the partition matrix:
for 1 ≤ k ≤ N
if $D_{ikA} > 0$ for all i = 1, 2, ..., c

$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}},$$

otherwise,
$\mu_{ik}^{(l)} = 0$ if $D_{ikA} > 0$, and $\mu_{ik}^{(l)} \in [0, 1]$
with $\sum_{i=1}^{c} \mu_{ik}^{(l)} = 1$.

until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$.
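
The modified algorithm can be sketched as follows, under several assumptions that go beyond the text: the pairwise graph distances (for example from Eq. 2.4) are precomputed in a symmetric matrix `dist`, the median is taken as the set median restricted to graphs with at least one copy, A is taken as 1 so that D equals the graph distance, and membership is split evenly when a graph coincides with several representatives. All function names, the `max_iter` safeguard, and the toy distance matrix are hypothetical.

```python
import math


def copies(U, j, n):
    """e_j(i) = floor(n * mu_ij / sum_r mu_rj); U[j][i] is the membership of graph i in cluster j."""
    total = sum(U[j])
    return [math.floor(n * u / total) for u in U[j]]


def weighted_median(dist, counts):
    """Index of the graph in the multiset with the lowest total distance to the multiset."""
    n = len(dist)
    candidates = [s for s in range(n) if counts[s] > 0]
    return min(candidates, key=lambda s: sum(counts[y] * dist[s][y] for y in range(n)))


def graph_fcm(dist, c, U, m=2.0, eps=1e-3, max_iter=50):
    """Fuzzy c-means on graphs given only pairwise distances; returns (U, medians)."""
    n = len(dist)
    medians = []
    for _ in range(max_iter):
        # Step 1: representative of each cluster from its multiset of copies.
        medians = [weighted_median(dist, copies(U, j, n)) for j in range(c)]
        # Steps 2 and 3: distances to the representatives, then membership update.
        U_new = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [dist[medians[j]][k] for j in range(c)]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                for j in zeros:
                    U_new[j][k] = 1.0 / len(zeros)
            else:
                for j in range(c):
                    U_new[j][k] = 1.0 / sum(
                        (d[j] / d[i]) ** (2.0 / (m - 1.0)) for i in range(c))
        change = max(abs(U_new[j][k] - U[j][k]) for j in range(c) for k in range(n))
        U = U_new
        if change < eps:
            break
    return U, medians


# Hypothetical distances between four graphs: {0, 1} and {2, 3} are close pairs.
dist = [[0.0, 0.2, 0.9, 0.8],
        [0.2, 0.0, 0.8, 0.9],
        [0.9, 0.8, 0.0, 0.1],
        [0.8, 0.9, 0.1, 0.0]]
U0 = [[0.9, 0.6, 0.4, 0.1],
      [0.1, 0.4, 0.6, 0.9]]
U_graph, medians = graph_fcm(dist, c=2, U=U0)
print(medians)  # [0, 2]
```

One limitation of this sketch is that two clusters can select the same representative, in which case their memberships become identical; a practical implementation would need a rule for breaking such ties.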


5. CONCLUSION


In this article, we suggested a clustering method for graph-based data,
with special reference to graphs representing web documents. The basic
idea is the calculation of the cluster center in the case of graphical
objects. We have modified Steps 1 and 2 of the original FCM algorithm
so that it can handle graph-based data, and we have made these changes
without altering the fundamental concepts of the FCM algorithm. We
expect this method to improve the effectiveness of FCM clustering for
such data, since graphical objects supply the clustering method with
information that simpler representations discard [6, 7, 8].